This paper introduces object-oriented attribute grammars.
These can be characterized as a notation for
all classes of attribute grammars. Based on a subtype relation
between grammar rules, inheritance of attributes and attribute computations
are defined. With this approach, attributes local to grammar rules and the
elimination of chain rules are possible without any special constructs.
We present object-oriented attribute grammars by a formal definition and by a few typical
examples. They are compared to the concepts of related areas. We conclude by sketching an
implementation of object-oriented attribute grammars as specification language of an
attribute evaluator generator called
.i Ag
which processes ordered attribute grammars (OAGs) and higher order
attribute grammars (HAGs). A first realistic application showed that the
generated attribute evaluators are very efficient and can be used in
production quality systems.
.sh 1 Introduction
.pp
This paper defines a notation for attribute grammars called object-oriented
attribute grammars. The notation can be used to describe concrete syntax,
abstract syntax, and attribute grammars in a uniform way.
Our original research goal was to promote the acceptance of attribute
grammars by improving the specification language and by
generating evaluators that are as efficient as possible.
.pp
Many existing attribute grammar systems\*([<\*([[DJL88\*(]]\*(>]
are based on the concrete syntax of the source language.
It has several advantages to base attribute
grammars on the abstract syntax, however. Most of the terminal symbols and chain rules
disappear. This makes the specification shorter and clearer. The abstract syntax can be
optimized towards the task of semantic analysis. Usually, this leads to fewer
grammar rules, fewer attribute computations, and fewer nodes in the structure tree.
The result is a simplification of the attribute grammar as well as a reduction of space
and run time because less nodes have to be stored and visited during attribute evaluation.
Therefore efficiency is increased.
In this paper we assume that attribute evaluation is performed on the basis of an attributed
tree which is stored in memory.
.pp
The roots for our definition of object-oriented attribute grammars can already be found in the
implementation of current attribute grammar systems. In attribute grammars, nonterminals
can be associated
with a set of attributes. There may be several grammar rules having this nonterminal as
left-hand side symbol. The implementation of attributed trees uses a separate node type for
every grammar rule. Now all node types for one nonterminal have to store the attributes of
this nonterminal. This leads to the inheritance of information from a nonterminal to the
associated rules.
Object-oriented attribute grammars generalize this observation. Besides attributes, right-hand
sides and attribute computations can be inherited as well. Inheritance is extended from one
level (from nonterminals to rules) to arbitrary many levels. Attributes local to a rule can
be introduced without any special construct.
.pp
The rest of this paper is organized as follows:
The next section defines object-oriented attribute grammars formally.
Section 3 presents examples of object-oriented attribute grammars using the specification
language of an attribute grammar system.
Section 4 compares object-oriented attribute grammars to the concepts of conventional
attribute grammars, trees, types, and object-oriented programming.
Section 5 shortly sketches an implementation of object-oriented attribute grammars and reports
early experiences from a non-trivial application project.
.sh 1 "Definition"
.pp
This section defines the principles of object-oriented attribute grammars.
As starting point we shortly recall the traditional definition of attribute grammars
\*([[Knu68\*(],Knu71\*(]].
.pp
An attribute grammar is an extension of a context-free grammar.
A context-free grammar is denoted by G = (N, T, P, Z) where
N is the set of nonterminals,
T is the set of terminals,
P is the set of productions, and
Z \(mo N is the
.i start
symbol, which cannot appear on the right-hand side of any production in P.
The set V = N \(cu T is called the vocabulary.
Each production p \(mo P has the form $ p:~X~\(->~alpha $ where X \(mo N and
$ alpha~\(mo~V sup "*" $.
The relation \(:> (directly derives) is defined over strings in $ V sup "*" $ as follows:
if $ p:~X~\(->~alpha $, p \(mo P, $ nu X omega~\(mo~V sup "*" $,
$ nu alpha omega~\(mo~V sup "*" $ then
$ nu X omega~\(:>~nu alpha omega $.
The relation $ \(:> sup "*" $ is the transitive and reflexive closure of \(:>.
The language L(G) is defined as $ L(G)~=~"{"~w~|~Z~\(:> sup *~w~"}" $.
.pp
An attribute grammar augments a context-free grammar by attributes and attribute
computations. A set of attributes is associated with each symbol in V. Attribute computations
are added to each production describing how to compute attribute values.
This simple view of attribute grammars shall suffice for the scope of this paper.
.pp
In general there can be several productions having the same nonterminal on the left-hand side.
This allows for different derivations starting from one nonterminal. In object-oriented
attribute grammars, one production is permitted for one left-hand side symbol, only. This way
the notions production and nonterminal (vocabulary respectively) become the same and are
called
.i "node type"
in the following. Several different derivations are made possible through the newly introduced
subtype relation.
.pp
An object-oriented attribute grammar is formally denoted by G = (N, T, A, C, Z) where
N is the set of nonterminals,
T is the set of terminals,
A is the set of attributes,
C is the set of attribute computations, and
Z is the start symbol (Z \(mo N).
The set NT = N \(cu T is called the set of
.i "node types" .
Each element n \(mo NT is associated with a tuple n: (R, B, D, S) where
$ R~\(mo~NT sup "*" $ is the right-hand side,
$ B~\(mo~A sup "*" $ is the set of attributes,
$ D~\(mo~C sup "*" $ is the set of attribute computations, and
S \(mo NT is the base type.
.pp
The elements of NT induce a relation \(ib (subtype) over NT as follows:
if $ n:~( alpha ,~beta ,~delta ,~m )~\(mo~NT $ then n \(ib m.
m is called
.i base
or
.i super
type, n is called
.i derived
type or
.i subtype .
The relation \(ib is transitive:
if n \(ib m and m \(ib o then n \(ib o.
.pp
The relation \(:> (directly derives) is defined here only for the context-free part of an
object-oriented attribute grammar. There are two possibilities for derivations which are
defined over strings in $ NT sup "*" $ as follows:
.lp
.nf
.ta 3.9c
if $ nu n sub i omega~\(mo~NT sup "*" $ and $ n sub 1 :~( alpha sub 1 ,~beta sub 1 ,~delta sub 1 ,~n sub 0 )~\(mo~NT, $
$ n sub 2 :~( alpha sub 2 ,~beta sub 2 ,~delta sub 2 ,~n sub 1 )~\(mo~NT, $
$ ... $
$ n sub i :~( alpha sub i ,~beta sub i ,~delta sub i ,~n sub i-1 )~\(mo~NT $ then $ nu n sub i omega~\(:>~nu alpha sub 1 alpha sub 2 ... alpha sub i omega $.
.lp
.nf
if $ nu n omega~\(mo~NT sup "*" $ and $ m~\(ib~n $ then $ nu n omega~\(:>~nu m omega $.
.lp
We assume the existence of a predefined node type $ n sub 0 :~(\(o/,~\(o/,~\(o/,~-) $ with
empty components. In a direct derivation step, a node type can be replaced by its right-hand
side $ ( alpha sub 1 ... alpha sub i ) $ or by one of its subtypes (m). All replacing
right-hand sides are the union of right-hand sides according to the subtype hierarchy.
The relation $ \(:> sup "*" $ is the transitive and reflexive closure of \(:>.
The language L(G) is defined as $ L(G)~=~"{"~w~|~Z~\(:> sup "*"~w~"}" $.
.pp
The subtype relation has the following properties: a derived node type inherits the
right-hand side, the attributes, and the attribute computations from its base type. As
consequence of the transitive nature of this relation, a derived type inherits all the
components from all base types according to the subtype hierarchy.
It may extend the set of inherited items by defining
additional right-hand side elements, attributes, or attribute computations. All accumulated
right-hand side elements and attributes must be distinct because they are
united. An attribute computation for an attribute may overwrite an inherited one.
The definition of the subtype relation allows exactly single inheritance.
.sh 1 Examples
.pp
We implemented an attribute grammar system called
.i Ag
based on object-oriented attribute grammars\*([<\*([[GrE90\*(]]\*(>]. The following
examples of object-oriented attribute grammars are given in the specification language of
.i Ag .
The language tries to adhere to the conventional style of grammars as far as possible.
It offers far more features for practical usage than can be explained here. The interested
reader is referred to the user's manual\*([<\*([[Gro89b\*(]]\*(>].
.(b
Example 1:
.sp 0.5
.FT
Expr = <
Add = Lop: Expr '+' Rop: Expr .
Sub = Lop: Expr '-' Rop: Expr .
Const = Integer .
> .
Integer : .
\&'+' : .
\&'-' : .
.)b
.pp
Example 1 describes the concrete syntax of primitive expressions. It defines four nonterminal
node types (Expr, Add, Sub, and Const) and 3 terminal node types (Integer, '+', and '-').
The distinction between nonterminals and terminals is based on the characters '=' and ':'.
Terminal node types without attributes do not have to be defined explicitly because
undefined node types are defined implicitly as terminals. Node types can be named by
identifiers or strings. In this example only the right-hand side components and the subtype
relation are used - attributes and attribute computations do not appear. Expr has zero, Add
and Sub have three, and Const has one right-hand side element(s) or children. A child consists
of a selector name and node type. Selector names are added to allow unambiguous access to
children. Missing selector names are implicitly defined to be equal to the name of the node
type. The children of the node type Add have the selector names Lop, '+', and Rop.
Their node types are Expr, '+', and Expr. The node type Const has one child with
the selector name Integer of type Integer.
The subtype relation is expressed by enclosing all subtypes of a base type in < > brackets.